Improving Probabilistic Record Linkage Using Statistical Prediction Models
Abstract
Record linkage brings together information from records in two or more data sources that are believed to belong to the same statistical unit, based on a common set of matching variables. Matching variables, however, can appear with errors and variations, and the challenge is to link statistical units that are subject to error. We provide an overview of record linkage techniques and specifically investigate the classic Fellegi and Sunter probabilistic record linkage framework to assess whether the decision rule for classifying pairs into sets of matches and non‐matches can be improved by incorporating a statistical prediction model. We also study whether the enhanced linkage rule can provide better results in terms of preserving associations between variables in the linked file that are not used in the record linkage procedure. A simulation study and an application based on real data are used to evaluate the methods.
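For readers unfamiliar with the classic Fellegi and Sunter decision rule mentioned in the abstract, the following is a minimal sketch under stated assumptions, not the authors' implementation. The field names, the m- and u-probabilities, and the two thresholds are hypothetical values chosen only for illustration, and the prediction-model enhancement studied in the paper is not shown.

```python
import math

# Hypothetical m-probabilities: P(field agrees | pair is a true match).
M_PROB = {"surname": 0.95, "birth_year": 0.90, "postcode": 0.85}
# Hypothetical u-probabilities: P(field agrees | pair is a non-match).
U_PROB = {"surname": 0.01, "birth_year": 0.05, "postcode": 0.02}


def match_weight(agreement):
    """Sum the log2 likelihood ratios over the matching variables.

    `agreement` maps each field name to True (agrees) or False (disagrees).
    Fields are treated as conditionally independent, as in the classic model.
    """
    total = 0.0
    for field, agrees in agreement.items():
        m, u = M_PROB[field], U_PROB[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight, upper=5.0, lower=0.0):
    """Three-way decision rule with illustrative (assumed) thresholds."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


if __name__ == "__main__":
    pair = {"surname": True, "birth_year": False, "postcode": False}
    w = match_weight(pair)
    print(round(w, 2), classify(w))
```

Pairs whose weight falls between the two thresholds are usually sent for clerical review. In practice the m- and u-probabilities are estimated from the data rather than fixed by hand.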
Similar resources
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When comprehensive information about a topic is scattered across two or more data sets, using only one of them loses the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in recognizing duplicates within a data set. The i...
Probabilistic record linkage
Studies involving the use of probabilistic record linkage are becoming increasingly common. However, the methods underpinning probabilistic record linkage are not widely taught or understood, and therefore these studies can appear to be a 'black box' research tool. In this article, we aim to describe the process of probabilistic record linkage through a simple exemplar. We first introduce the c...
Improving Temporal Record Linkage Using Regression Classification
Temporal record linkage is the process of identifying groups of records that are collected over a period of time, such as in census or voter registration databases, where records in the same group represent the same real-world entity. Such databases often contain temporal information, such as the time when a record was created or when it was modified. Unlike traditional record linkage, which co...
Validating Distance-Based Record Linkage with Probabilistic Record Linkage
This work compares two alternative methods for record linkage: distance based and probabilistic record linkage. It compares the performance of both approaches when data is categorical. To that end, a distance over ordinal and nominal scales is defined. The paper shows that, for categorical data, distance-based and probabilistic-based record linkage lead to similar results in relation to the num...
Journal
Journal title: International Statistical Review
Year: 2022
ISSN: 0306-7734, 1751-5823
DOI: https://doi.org/10.1111/insr.12535